Lecture 36: Non-Parametric Statistical Tests

Chapter 27

Corrine Riddell (Instructor: Tomer Altman)

November 21, 2025

Concussions

In Pittsburgh, accomplished pathologist Dr. Bennet Omalu uncovers the truth about brain damage in football players who suffer repeated concussions in the course of normal play.

Concussion (IMDB)

Article on Concussions

From the article:

STATISTICAL ANALYSIS

Between-group comparisons of age, years of education, and MMSE scores were analyzed with Mann–Whitney U tests. Group differences in race were analyzed with the use of chi-square tests. For between-group comparisons of amyloid-beta plaque burden, chi-square tests were used to compare the proportion of participants with a positive florbetapir PET, and t-tests were used to compare the mean cortical:cerebellar florbetapir standard uptake value ratio (SUVR, the ratio of radioactivity in a cerebral region to that in the cerebellum as a reference) between the groups.

From the article

Roadmap

Parametric Testing:

But all of the methods we have looked at so far depend on some assumptions about the underlying distribution.

What have we assumed?

What do we do if our assumptions are violated?

Non-Parametric Testing

PROS: Non-parametric methods make very few assumptions about the variable(s) we sample or their distribution, and thus rely less on “parameters”.

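CONS: Non-parametric methods use less of the information offered in the data (for example, only the ranks of the observations rather than their actual values), so when the assumptions of the parametric test do hold, they can have less power.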

Non-Parametric Testing

We will discuss non-parametric equivalents for:

Two-sample t-test: Wilcoxon rank-sum

Paired t-test: Wilcoxon signed-rank

ANOVA: Kruskal-Wallis

Wilcoxon Two-Sample Tests

Frank Wilcoxon

In one paper in 1945 he proposed both the Wilcoxon rank-sum test and the Wilcoxon signed-rank test.

Wilcoxon Rank-Sum

To calculate a rank sum test:

  1. Pool the observations from both groups, order them from lowest to highest, and assign each observation its rank in that order.

  2. If there are “tied” values, each of them is assigned the average of the ranks they would occupy.

    • E.g., if two observations have the same value and the next lower value has a rank of 3, then the two observations are both given the rank of 4.5 (because they would have been ranks 4 and 5).
  3. Then the sum of the ranks belonging to Group 1 is compared to the sum of the ranks belonging to Group 2.
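The ranking and summing steps can be sketched in R with the values from the worked example (group1 and group2 below are just those two lists of values). rank() gives tied values the average of the ranks they occupy by default, which matches step 2.

```r
# Values from the worked example
group1 <- c(4, 3, 5, 2, 6)
group2 <- c(6, 5, 7, 4, 8)

# Steps 1-2: pool and rank all observations; rank() gives tied values
# the average of the ranks they occupy (ties.method = "average" is the default)
pooled <- c(group1, group2)
ranks  <- rank(pooled)
labels <- rep(c("group1", "group2"), times = c(length(group1), length(group2)))

# Step 3: sum the ranks belonging to each group
rank_sum1 <- sum(ranks[labels == "group1"])
rank_sum2 <- sum(ranks[labels == "group2"])
c(group1 = rank_sum1, group2 = rank_sum2)
```

The two sums should match the hand calculation: 19.5 for group 1 and 35.5 for group 2.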

Wilcoxon Rank-Sum

Values in group 1: 4, 3, 5, 2, 6

Values in group 2: 6, 5, 7, 4, 8

Wilcoxon Rank-Sum: Ranking

Value  2  3  4    4    5    5    6    6    7  8
Index  1  2  3    4    5    6    7    8    9  10
Rank   1  2  3.5  3.5  5.5  5.5  7.5  7.5  9  10

Wilcoxon Rank-Sum: Summation

Group 1 value  Rank   Group 2 value  Rank
4              3.5    6              7.5
3              2      5              5.5
5              5.5    7              9
2              1      4              3.5
6              7.5    8              10
Sum            19.5   Sum            35.5

Wilcoxon Rank-Sum

The smaller of the two rank sums is called W. Let \(n_S\) be the size of the group with the smaller rank sum and \(n_L\) the size of the other group. W is then used in the following equation to generate a Z statistic.

\[Z_{w}=\frac{W-\mu_w}{\sigma_{w}}\] where

\[\mu_w=\frac{n_S(n_S+n_L+1)}{2}\] and

\[\sigma_{w}=\sqrt{\frac{n_Sn_L(n_S+n_L+1)}{12}}\]

Wilcoxon Rank-Sum

In our example, group 1 had a rank sum of 19.5 and group 2 had a rank sum of 35.5, so W = 19.5 and \(n_S = n_L = 5\):

\[\mu_w=\frac{n_S(n_S+n_L+1)}{2}=\frac{5(5+5+1)}{2}=27.5\] and

\[\sigma_{w}=\sqrt{\frac{n_Sn_L(n_S+n_L+1)}{12}}=\sqrt{\frac{5\times5(5+5+1)}{12}}=4.8\]

\[Z_{w}=\frac{W-\mu_w}{\sigma_{w}}=\frac{19.5-27.5}{4.8}=-1.67\]
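A small helper that reproduces the arithmetic above is sketched below. rank_sum_z() is a hypothetical name (not a built-in R function), and, like the hand formulas, it makes no adjustment for tied ranks.

```r
# Hypothetical helper: normal approximation for the rank-sum statistic W,
# using the mean and standard deviation formulas above (no tie correction)
rank_sum_z <- function(W, n_S, n_L) {
  mu_w    <- n_S * (n_S + n_L + 1) / 2
  sigma_w <- sqrt(n_S * n_L * (n_S + n_L + 1) / 12)
  c(mu = mu_w, sigma = sigma_w, z = (W - mu_w) / sigma_w)
}

# Smaller rank sum from the worked example, 5 observations per group
z_check <- rank_sum_z(W = 19.5, n_S = 5, n_L = 5)
round(z_check, 3)
```

Without intermediate rounding the standard deviation is about 4.787 and Z is about -1.671, in line with the rounded values above.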

Wilcoxon Rank-Sum

Under the null hypothesis, \(Z_{w}\) approximately follows a standard Normal distribution, so we can use it to get a two-sided p-value in R:

2*pnorm(-1.67)
## [1] 0.09491936
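As a cross-check, the sketch below runs the same example through R's built-in wilcox.test(). Two things differ from the hand calculation. First, R reports W as the rank sum of the first sample minus n(n+1)/2, here 19.5 - 15 = 4.5. Second, its normal approximation uses a standard deviation adjusted for tied ranks, so the p-value (about 0.092) is a little smaller than 0.095. The settings exact = FALSE and correct = FALSE ask for the plain normal approximation without a continuity correction.

```r
group1 <- c(4, 3, 5, 2, 6)
group2 <- c(6, 5, 7, 4, 8)

# Hand calculation from the slides: Z = -1.67, two-sided p-value
p_hand <- 2 * pnorm(-1.67)

# R's version: normal approximation (exact = FALSE) without continuity
# correction (correct = FALSE); R also adjusts the variance for ties
res <- wilcox.test(group1, group2, exact = FALSE, correct = FALSE)
res$statistic   # rank sum of group1 minus 5 * 6 / 2
c(hand = p_hand, wilcox_test = res$p.value)
```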

Wilcoxon Rank-Sum in R

The general syntax will be:

wilcox.test(group1, group2, paired = FALSE)

or

wilcox.test(outcome ~ group)

Remember that you can always type help(wilcox.test) in your console to get the full details.
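To check that the two calling styles give the same answer, the sketch below stores the worked example in a hypothetical long-format data frame, example_df (one column of outcomes, one column of group labels), and runs both. exact = FALSE asks for the normal approximation directly, which avoids the warning that an exact p-value cannot be computed with ties.

```r
# Hypothetical long-format data frame holding the worked example
example_df <- data.frame(
  outcome = c(4, 3, 5, 2, 6, 6, 5, 7, 4, 8),
  group   = rep(c("group1", "group2"), each = 5)
)

# Two-vector interface
two_vec <- wilcox.test(example_df$outcome[example_df$group == "group1"],
                       example_df$outcome[example_df$group == "group2"],
                       paired = FALSE, exact = FALSE)

# Formula interface: outcome ~ group, with variables looked up in `data`
formula_ver <- wilcox.test(outcome ~ group, data = example_df, exact = FALSE)

c(two_vectors = two_vec$p.value, formula = formula_ver$p.value)
```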

Wilcoxon Rank-Sum example: Phenylketonuria

Normalized mental age scores for children with phenylketonuria (PKU):

Group 1: “low exposure” < 10.0 mg/dl

Group 2: “high exposure” >= 10.0 mg/dl

Wilcoxon Rank-Sum: Phenylketonuria

##   Group  nMA
## 1   low 34.5
## 2   low 37.5
## 3   low 39.5
## 4   low 40.0
## 5   low 45.5
## 6   low 47.0

Wilcoxon Rank-Sum: Phenylketonuria

In this example there are 18 high-exposure and 21 low-exposure individuals.

library(dplyr)  # group_by(), summarise(), n() and the %>% pipe

group_by(pku, Group) %>%
  summarise(
    count = n(),
    median = median(nMA, na.rm = TRUE),
    IQR = IQR(nMA, na.rm = TRUE)
  )
## # A tibble: 2 × 4
##   Group count median   IQR
##   <chr> <int>  <dbl> <dbl>
## 1 high     18   48.2  9.12
## 2 low      21   51    7

Wilcoxon Rank-Sum: Phenylketonuria

If we graph the distributions with a density plot, what do we notice?

library(ggplot2)

ggplot(pku, aes(x = nMA)) + 
  geom_density(aes(fill = Group), alpha = 0.5) +
  theme_minimal(base_size = 15)

Wilcoxon Rank-Sum: Phenylketonuria

wilcox.test(nMA ~ Group, data=pku)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  nMA by Group
## W = 142, p-value = 0.1896
## alternative hypothesis: true location shift is not equal to 0

Check your understanding!

Wilcoxon Rank-Sum vs T: NHANES example

Here I will again use the NHANES data as an example, looking at height by gender:

# Read CSV into R
nhanes <- read.csv(file="./data/nhanes.csv", header=TRUE, sep=",")
names(nhanes)
##  [1] "ridageyr"  "agegroup"  "gender"    "military"  "born"      "citizen"  
##  [7] "drinks"    "drinkscat" "bmxwt"     "bmxht"     "bmxbmi"    "bmicat"   
## [13] "bpxpls"    "bpxsy1"    "bpxsy2"    "sys1d"     "sys2d"     "bpxdi1"   
## [19] "bpxdi2"    "dias1d"    "dias2d"    "bpcat"     "chest"     "fs1"      
## [25] "fs2"       "fs3"       "lbdhdd"    "hdlcat"    "highhdl"   "hi"       
## [31] "asthma"    "vwa"       "vra"       "va"        "aspirin"   "sleep"    
## [37] "is"        "hs"        "lbdldl"    "highldl"

Wilcoxon Rank-Sum vs T

ggplot(nhanes, aes(x = bmxht)) + 
  geom_density(aes(fill=gender), alpha=0.1) +
  theme_minimal(base_size = 15)

Wilcoxon Rank-Sum vs T

t.test(malesht, femalesht, paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  malesht and femalesht
## t = 47.285, df = 2384, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  13.37441 14.53172
## sample estimates:
## mean of x mean of y 
##  174.4717  160.5186

Wilcoxon Rank-Sum vs T

wilcox.test(malesht,femalesht)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  malesht and femalesht
## W = 1402065, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

Wilcoxon Sign Rank

Steps:

  1. Calculate the difference within each pair of observations.

  2. Drop any pair whose difference is zero, then rank the absolute differences from smallest to largest (again, tied values get the average of their ranks).

  3. Record the “sign” of each difference (i.e., positive or negative).

  4. Sum the ranks of the positive differences and, separately, the ranks of the negative differences; the smaller of the two sums is denoted T.
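The four steps can be sketched in R on the pre-/post-test data from the example that follows; the variable names are illustrative. The statistic is stored as T_stat rather than T, because T is R's built-in shorthand for TRUE.

```r
# Pre- and post-test scores from the example
time1 <- c(65, 87, 77, 90, 70, 84, 92, 83, 85, 91, 68, 72, 81)
time2 <- c(77, 100, 75, 89, 80, 81, 91, 96, 84, 89, 88, 100, 81)

# Step 1: difference within each pair (Time 2 minus Time 1)
d <- time2 - time1

# Step 2: drop zero differences, then rank |d| (ties get average ranks)
d <- d[d != 0]
r <- rank(abs(d))

# Steps 3-4: sum the ranks of positive and of negative differences
pos_sum <- sum(r[d > 0])
neg_sum <- sum(r[d < 0])
T_stat  <- min(pos_sum, neg_sum)
c(positive = pos_sum, negative = neg_sum, T = T_stat)
```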

Wilcoxon Sign Rank

Under the null hypothesis that the differences are centered at zero, we would expect roughly equal numbers of positive and negative differences, with similar rank sums. This expectation is tested with the statistic \[Z_{T}=\frac{T-\mu_{T}}{\sigma_{T}}\]

where n is the number of pairs with a nonzero difference, \[\mu_{T}=\frac{n(n+1)}{4}\] and \[\sigma_{T}=\sqrt{\frac{n(n+1)(2n+1)}{24}}\]

Wilcoxon Sign Rank: Example Pre- and Post-Test

Time 1  Time 2
65      77
87      100
77      75
90      89
70      80
84      81
92      91
83      96
85      84
91      89
68      88
72      100
81      81

Sign Rank Example

Sign Rank Example: Calculate Difference and Sign

Time 1  Time 2  Difference (Time 2 - Time 1)  Sign
65      77      12                            +
87      100     13                            +
77      75      -2                            -
90      89      -1                            -
70      80      10                            +
84      81      -3                            -
92      91      -1                            -
83      96      13                            +
85      84      -1                            -
91      89      -2                            -
68      88      20                            +
72      100     28                            +
81      81      0                             none

Sign Rank Example: Sort by Absolute Value and Assign Rank

Time 1  Time 2  Difference  Sign  Rank
90      89      -1          -     2
92      91      -1          -     2
85      84      -1          -     2
77      75      -2          -     4.5
91      89      -2          -     4.5
84      81      -3          -     6
70      80      10          +     7
65      77      12          +     8
87      100     13          +     9.5
83      96      13          +     9.5
68      88      20          +     11
72      100     28          +     12
81      81      0           none  dropped

Sign Rank Example: Sum the Positive and Negative Ranks

Negative signs

Time 1  Time 2  Difference  Sign  Rank
90      89      -1          -     2
92      91      -1          -     2
85      84      -1          -     2
77      75      -2          -     4.5
91      89      -2          -     4.5
84      81      -3          -     6

The sum of the negative-sign ranks is 2 + 2 + 2 + 4.5 + 4.5 + 6 = 21.

Sign Rank Example: Sum the Positive and Negative Ranks

Positive signs

Time 1  Time 2  Difference  Sign  Rank
70      80      10          +     7
65      77      12          +     8
87      100     13          +     9.5
83      96      13          +     9.5
68      88      20          +     11
72      100     28          +     12

The sum of the positive-sign ranks is 7 + 8 + 9.5 + 9.5 + 11 + 12 = 57.

Wilcoxon Sign Rank: Example

Remember that we had 13 pairs but dropped one because its Time 1 and Time 2 values were the same, so n = 12. Our expectation would be: \[\mu_{T}=\frac{n(n+1)}{4}=\frac{12(12+1)}{4}=39\] and \[\sigma_{T}=\sqrt{\frac{n(n+1)(2n+1)}{24}}=\sqrt{\frac{12(12+1)(2\times12+1)}{24}}=12.75\]

Wilcoxon Sign Rank: Example

We then compare the smaller rank sum to this expectation (the sum of negative ranks was 21 and the sum of positive ranks was 57, so T = 21): \[Z_{T}=\frac{T-\mu_{T}}{\sigma_{T}}=\frac{21-39}{12.75}=-1.412\]

2*pnorm(-1.412)
## [1] 0.15795
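The normal approximation for the signed-rank test can also be wrapped in a small function. signed_rank_z() is a hypothetical helper (not part of base R); it uses the formulas above with no tie correction, so it reproduces the hand calculation rather than matching R's wilcox.test() output to every digit.

```r
# Hypothetical helper: normal approximation for the signed-rank statistic T,
# where n is the number of pairs with a nonzero difference
signed_rank_z <- function(T_stat, n) {
  mu_T    <- n * (n + 1) / 4
  sigma_T <- sqrt(n * (n + 1) * (2 * n + 1) / 24)
  z <- (T_stat - mu_T) / sigma_T
  c(mu = mu_T, sigma = sigma_T, z = z, p = 2 * pnorm(-abs(z)))
}

# Worked example: T = 21 from 12 nonzero differences
sr <- signed_rank_z(T_stat = 21, n = 12)
round(sr, 4)
```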

Wilcoxon Sign Rank in R

The general syntax will be:

wilcox.test(group1, group2, paired = TRUE)

or

wilcox.test(Pair(group1, group2) ~ 1)
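A minimal sketch of both calling styles on the pre-/post-test example, assuming a wide data frame with one row per student (the name scores and its columns time1 and time2 are hypothetical). We believe the Pair(x, y) ~ 1 formula requires R 4.0.0 or later. correct = FALSE turns off the continuity correction, and exact = FALSE requests the normal approximation directly (an exact p-value is not available here anyway because of the ties and the zero difference).

```r
# Hypothetical wide-format data: one row per student
scores <- data.frame(
  time1 = c(65, 87, 77, 90, 70, 84, 92, 83, 85, 91, 68, 72, 81),
  time2 = c(77, 100, 75, 89, 80, 81, 91, 96, 84, 89, 88, 100, 81)
)

# Two-vector interface with paired = TRUE
v1 <- wilcox.test(scores$time1, scores$time2, paired = TRUE,
                  exact = FALSE, correct = FALSE)

# Formula interface: Pair(x, y) ~ 1 performs the same paired test
v2 <- wilcox.test(Pair(time1, time2) ~ 1, data = scores,
                  exact = FALSE, correct = FALSE)

c(two_vectors = unname(v1$statistic), formula = unname(v2$statistic))
```

In both cases R reports V, the sum of the ranks of the positive differences time1 - time2, which here equals the smaller sum T = 21.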

Wilcoxon Sign Rank: Example

wilcox.test(test1,test2,paired=T, correct=FALSE)
## Warning in wilcox.test.default(test1, test2, paired = T, correct = FALSE):
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(test1, test2, paired = T, correct = FALSE):
## cannot compute exact p-value with zeroes
## 
##  Wilcoxon signed rank test
## 
## data:  test1 and test2
## V = 21, p-value = 0.157
## alternative hypothesis: true location shift is not equal to 0

Wilcoxon Sign Rank: Compare to T

t.test(test1,test2,paired=TRUE)
## 
##  Paired t-test
## 
## data:  test1 and test2
## t = -2.3684, df = 12, p-value = 0.0355
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -12.7011701  -0.5295991
## sample estimates:
## mean difference 
##       -6.615385

Wilcoxon Sign Rank: Compare to T

This study is small: the paired t-test uses all 13 pairs, while the signed-rank test uses 12 (the pair with a difference of zero is dropped). The distribution of changes looks like this:

hist(Change)

Non-parametric Test for Three or More Samples

Kruskal-Wallis

## 
##  Kruskal-Wallis rank sum test
## 
## data:  outcome by treatment
## Kruskal-Wallis chi-squared = 13.096, df = 3, p-value = 0.004434
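The data behind the output above are not shown. As a sketch of what kruskal.test() computes, the code below uses a small made-up data set (outcome and treatment are hypothetical values chosen to have no ties). All observations are ranked together, and the usual Kruskal-Wallis statistic \(H=\frac{12}{N(N+1)}\sum_i \frac{R_i^2}{n_i}-3(N+1)\) is computed, where \(R_i\) and \(n_i\) are the rank sum and size of group i. With no ties, this equals the chi-squared value reported by kruskal.test(), which is referred to a chi-squared distribution with (number of groups - 1) degrees of freedom.

```r
# Hypothetical data: three groups of three, no tied values
outcome   <- c(1, 3, 5, 2, 4, 6, 7, 8, 9)
treatment <- factor(rep(c("A", "B", "C"), each = 3))

# Rank all observations together, then get each group's rank sum and size
r   <- rank(outcome)
R_i <- tapply(r, treatment, sum)
n_i <- tapply(r, treatment, length)
N   <- length(outcome)

# Kruskal-Wallis H (no tie correction needed here)
H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

kw <- kruskal.test(outcome ~ treatment)
c(by_hand = H, kruskal_test = unname(kw$statistic))
```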

Non-Parametric Summary

Samples                  Parametric         Non-parametric
Two independent samples  Two-sample t-test  Wilcoxon rank-sum
Two paired samples       Paired t-test      Wilcoxon signed-rank
Three or more samples    ANOVA              Kruskal-Wallis

Non parametrics in R

Samples                  Test name             R function
Two independent samples  Wilcoxon rank-sum     wilcox.test(group1, group2, paired = FALSE)
Two paired samples       Wilcoxon signed-rank  wilcox.test(Pair(group1, group2) ~ 1)
Three or more samples    Kruskal-Wallis        kruskal.test(outcome ~ group)

Parting humor